Reestimation and Best-First Parsing Algorithm for Probabilistic Dependency Grammars

Authors

  • Seungmi Lee
  • Key-Sun Choi
Abstract

This paper presents a reestimation algorithm and a best-first parsing (BFP) algorithm for probabilistic dependency grammars (PDG). The proposed reestimation algorithm is a variation of the inside-outside algorithm adapted to probabilistic dependency grammars. The inside-outside algorithm is a probabilistic parameter reestimation algorithm for phrase structure grammars in Chomsky Normal Form (CNF). A dependency grammar represents a sentence structure as a set of dependency links between arbitrary pairs of words in the sentence, and so cannot be reestimated by the inside-outside algorithm directly. In this paper, non-constituent objects, complete-link and complete-sequence, are defined as the basic units of dependency structure, and their probabilities are reestimated. The reestimation and BFP algorithms utilize a CYK-style chart with the non-constituent objects as chart entries. Both algorithms have O(n⁵) time complexity.

1 Introduction

There have been many efforts to induce grammars automatically from corpora by exploiting the vast amount of corpus material with various degrees of annotation. Corpus-based, stochastic grammar induction has many advantages, such as simple acquisition and extension of linguistic knowledge, easy treatment of ambiguity by virtue of its innate scoring mechanism, and fail-soft reaction to ill-formed or extra-grammatical sentences. Most corpus-based grammar induction has concentrated on phrase structure grammars (Black, Lafferty, and Roukos, 1992; Lari and Young, 1990; Magerman, 1994). The typical steps of phrase structure grammar induction are as follows (Lari and Young, 1990; Carroll, 1992b): (1) generate all possible rules, (2) reestimate the probabilities of the rules using the inside-outside algorithm, and (3) finally find a stable grammar by eliminating the rules whose probability values are close to 0.
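The PDG model sketched in the abstract scores a dependency tree by the probabilities of its individual head-dependent links. The following is a brute-force illustration of that scoring model, not the paper's O(n⁵) chart-based BFP algorithm: it enumerates every head assignment, keeps the ones that form a rooted tree, and returns the most probable one. All link probabilities are invented for illustration.

```python
from itertools import product

def tree_prob(heads, link_prob):
    # heads[i] = index of the head of word i, or -1 for the root word.
    # The tree's probability is the product of its link probabilities.
    p = 1.0
    for dep, head in enumerate(heads):
        p *= link_prob.get((head, dep), 0.0)
    return p

def is_tree(heads):
    # A valid dependency structure has exactly one root and no cycles.
    if heads.count(-1) != 1:
        return False
    for i in range(len(heads)):
        seen, j = set(), i
        while heads[j] != -1:          # follow head links up to the root
            if j in seen:
                return False           # cycle detected
            seen.add(j)
            j = heads[j]
    return True

def best_tree(n, link_prob):
    # Exhaustive search over all head assignments (exponential; toy only).
    best, best_p = None, 0.0
    for heads in product(range(-1, n), repeat=n):
        heads = list(heads)
        if any(h == d for d, h in enumerate(heads)):
            continue                   # a word cannot head itself
        if not is_tree(heads):
            continue
        p = tree_prob(heads, link_prob)
        if p > best_p:
            best, best_p = heads, p
    return best, best_p

# Toy sentence: words 0 "the", 1 "dog", 2 "barks"; -1 denotes the root.
lp = {(-1, 2): 0.9, (2, 1): 0.8, (1, 0): 0.7, (2, 0): 0.3}
tree, p = best_tree(3, lp)
# → tree [1, 2, -1]: "barks" is the root, heads "dog", which heads "the"
```

The chart-based algorithms in the paper exist precisely to avoid this exponential enumeration by sharing complete-link and complete-sequence subanalyses across trees.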
Generating all the rules is done by restricting the number of nonterminals and/or the number of right-hand-side symbols in the rules and enumerating all possible combinations. Chen extracts rules by heuristics and reestimates their probabilities using the inside-outside algorithm (Chen, 1995). The inside-outside algorithm learns a grammar by iteratively adjusting the rule probabilities so as to minimize the training corpus entropy. It is extensively used as a reestimation algorithm for phrase structure grammars. Most of the work on phrase structure grammar induction, however, has only partially succeeded. Estimating phrase structure grammars by minimizing the training corpus ...
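The iterative adjustment described above is an instance of expectation-maximization: compute expected counts of each rule (here, each dependency link) under the current probabilities, then renormalize. The sketch below shows one such EM step over toy lexical link probabilities, using brute-force tree enumeration in place of the paper's chart-based inside-outside computation; all names and the smoothing constant are invented for illustration.

```python
from itertools import product

def all_trees(n):
    # Enumerate every head assignment over n words that forms a rooted,
    # acyclic dependency tree (-1 marks the root word).
    trees = []
    for heads in product(range(-1, n), repeat=n):
        if any(h == d for d, h in enumerate(heads)):
            continue
        if heads.count(-1) != 1:
            continue
        ok = True
        for i in range(n):
            seen, j = set(), i
            while heads[j] != -1:
                if j in seen:
                    ok = False
                    break
                seen.add(j)
                j = heads[j]
            if not ok:
                break
        if ok:
            trees.append(heads)
    return trees

def em_step(sentences, link_prob):
    # E-step: accumulate expected counts of each (head word, dep word)
    # link under the posterior over trees; M-step: renormalize per head.
    counts = {}
    for words in sentences:
        scored = []
        for heads in all_trees(len(words)):
            p = 1.0
            for dep, head in enumerate(heads):
                hw = "ROOT" if head == -1 else words[head]
                p *= link_prob.get((hw, words[dep]), 1e-6)  # smoothed
            scored.append((heads, p))
        z = sum(p for _, p in scored)      # inside probability of sentence
        for heads, p in scored:
            for dep, head in enumerate(heads):
                hw = "ROOT" if head == -1 else words[head]
                key = (hw, words[dep])
                counts[key] = counts.get(key, 0.0) + p / z
    totals = {}
    for (hw, _), c in counts.items():
        totals[hw] = totals.get(hw, 0.0) + c
    return {(hw, dw): c / totals[hw] for (hw, dw), c in counts.items()}

new_lp = em_step([["the", "dog", "barks"]], {})  # start from uniform
```

Iterating `em_step` to a fixed point corresponds to the reestimation loop; the paper's contribution is computing the same expectations in polynomial time via complete-link and complete-sequence chart entries rather than by enumerating trees.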

Similar Articles

A Framework for Unsupervised Dependency Parsing using a Soft-EM Algorithm and Bilexical Grammars

Unsupervised dependency parsing is acquiring great relevance in the area of Natural Language Processing due to the increasing number of utterances that become available on the Internet. Most current works are based on Dependency Model with Valence (DMV) [12] or Extended Valence Grammars (EVGs) [11], in both cases the dependencies between words are modeled by using a fixed structure of automata....


Unsupervised Bayesian Parameter Estimation for Dependency Parsing

We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal prior as a prior over the grammar parameters. We derive a variational EM algorithm for that model...


Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction

We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal distribution as a prior over the grammar parameters. We derive a variational EM algorithm for tha...


Parsing with Probabilistic Grammars

This paper describes some recent developments in the area of natural language parsing. Probabilistic Grammars are, in principle, grammars enriched with probabilities to distinguish between probable and improbable analyses of a sentence. The first part of the paper introduces the notation of Probabilistic Context-Free Grammars, together with a general algorithm for PCFG top-down parsing and some...
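The notion of "probable and improbable analyses" mentioned in this abstract can be made concrete with a small example: under a PCFG, a parse tree's probability is the product of the probabilities of the rules it uses, so two analyses of the same ambiguous sentence get comparable scores. The grammar and all numbers below are invented for illustration.

```python
# Hypothetical toy PCFG: probabilities sum to 1 per left-hand side.
pcfg = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.5, ("NP", ("NP", "PP")): 0.3, ("NP", ("Pro",)): 0.2,
    ("VP", ("V", "NP")): 0.6, ("VP", ("VP", "PP")): 0.4,
    ("PP", ("P", "NP")): 1.0,
    ("Pro", ("I",)): 1.0, ("V", ("saw",)): 1.0, ("Det", ("the",)): 1.0,
    ("N", ("man",)): 0.5, ("N", ("telescope",)): 0.5, ("P", ("with",)): 1.0,
}

def parse_prob(tree, pcfg):
    # tree = (label, child, ...) with bare strings as leaf words;
    # the parse probability is the product of all rule probabilities used.
    if isinstance(tree, str):
        return 1.0
    label, *kids = tree
    rhs = tuple(k if isinstance(k, str) else k[0] for k in kids)
    p = pcfg.get((label, rhs), 0.0)
    for k in kids:
        p *= parse_prob(k, pcfg)
    return p

# "I saw the man with the telescope" -- classic PP-attachment ambiguity.
np_man = ("NP", ("Det", "the"), ("N", "man"))
pp = ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "telescope")))
noun_attach = ("S", ("NP", ("Pro", "I")),
               ("VP", ("V", "saw"), ("NP", np_man, pp)))
verb_attach = ("S", ("NP", ("Pro", "I")),
               ("VP", ("VP", ("V", "saw"), np_man), pp))
```

With these numbers, `parse_prob` prefers the verb-attachment reading; changing the two VP/NP attachment rule probabilities flips the preference, which is exactly the disambiguation role the probabilities play.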


Bilexical Grammars and a Cubic-time Probabilistic Parser

Computational linguistics has a long tradition of lexicalized grammars, in which each grammatical rule is specialized for some individual word. The earliest lexicalized rules were word-specific subcategorization frames. It is now common to find fully lexicalized versions of many grammatical formalisms, such as context-free and tree-adjoining grammars [Schabes et al. 1988]. Other formalisms, suc...



Journal:

Volume   Issue

Pages  -

Publication year: 1997